Why bother?

The best way to learn coding is to

  • Choose the right first language. Python didn’t work for me in the beginning.

  • Try a lot of EDA(explanatory data analysis).

  • No need to hurry. You probably don’t know what you don’t know because you don’t have to.

These slides cannot cover everything you can do with Git. We will only cover version controls, collaboration, and GitHub pages. But I did bring a gift in the last slide:)

What is Git?

Git is a distributed version control system that tracks versions of files. It is often used to control source code by programmers who are developing software collaboratively.

  • Version Control
  • Backup
  • Collaboration


What is Git?

Examples in Economic Research


What is GitHub?

Git creates a detailed history of your directory(folder). GitHub is where you post it, or where you download resources from. You can share your work or copy others’ work.

Setting up

for Windows

  1. Go to https://git-scm.com.
  2. Click [Download 2.xx.x for Windows].
  3. Set all options to default.
  4. Open Git Bash in Windows toolbar.


for Mac

  1. Go to https://git-scm.com.
  2. Click [Download 2.xx.x for Mac].
  3. Open the package and install Git. After installation, you can trash the installator package.
  4. Open terminal with Cmd + spacebar.

If Git is properly installed, writing git --version in the terminal(shell) would display the following:

git --version
git version 2.39.5 (Apple Git-154)

Setting Up

  1. Set your user.name and user.email to the ones you have in your GitHub account.
git config --global user.name "wyeconomics"
git config --global user.email "tommypark822@naver.com"

Single(-) and double dash(--) stand for options.

  1. Check your current working directory with pwd, set it to Desktop if it isn’t, and create a practice directory.
pwd
cd Desktop
mkdir practice

Usually a project directory would have multiple subdirectories including codes, doc, data, figures, and temp.

Tip) cd .. leads to the outer folder containing current working directory.

  1. Once the main structure of a directory is made, initialize git with git init, and check it with Cmd+Shift+.
cd practice
git init

Tipbox
Directory vs Repository

These two are near identical in normal use. Locally, it is the folder of interest in a computer. However, repositories are directories with Git activated. Thus, folders in GitHub is mostly called a repo than a directory as it always entails Git.

Adding Files

REAEME file

Most projects include README files to explain the content of the project and what’s in the repo.

vim README.txt

Write “This is a practice repo of Git”, and close it with Ctrl(Cmd) + X and enter.

Tipbox
Terminal vs Shell

You might have come across these two words. Put simply, shell is a program that covers the core of the computer. In other words, it is a program, a translator between you and the computer, that touches the core functions of a computer. Shell could be incarnated in a Command Line Interface(CLI) or Graphical User Interface(GUI). But usually, when we say shell, it refers to CLIs. Terminal is the interface program where you can interact with the shell. You can consider it a imitated CLI on a GUI - or more accurately, a TUI. - An eptiome og GUI is the Windows File Explorer.

Adding Files

html output with Jupyter Notebook

Let’s create a html output with either Python(.ipynb) or R(.rmd). Save it as output.html in the doc subdirectory.

# Practice with the `penguins` dataset

## Cell one : loading data
import seaborn as sns
import pandas as pd

df = sns.load_dataset('penguins')
df.head(3)
  species     island  bill_length_mm  ...  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1  ...              181.0       3750.0    Male
1  Adelie  Torgersen            39.5  ...              186.0       3800.0  Female
2  Adelie  Torgersen            40.3  ...              195.0       3250.0  Female

[3 rows x 7 columns]
## Cell two : processing data
df = df.dropna(subset = ['sex', 'body_mass_g'], how = 'any', axis = 0)

### Cell three : plotting figure
import matplotlib.pyplot as plt

fig = plt.figure(figsize = (9,5))
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
df.plot(kind='scatter', x='flipper_length_mm', y='body_mass_g', c='coral', ax=ax1)
sns.regplot(x='flipper_length_mm', y='body_mass_g', data=df, scatter_kws={"alpha": 0.5}, ax=ax2)
plt.show()

# Cell one: Load required libraries
library(tidyverse)
library(palmerpenguins)
library(patchwork)

# Cell two: Load dataset
df <- penguins

head(df, 3)
# A tibble: 3 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
# ℹ 2 more variables: sex <fct>, year <int>
# Cell three: Process data
df <- df %>% filter(!is.na(sex), !is.na(body_mass_g))

# Cell four: Plotting figure
p1 <- ggplot(df, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(color = "coral") +
  labs(title = "Scatter Plot")
p2 <- ggplot(df, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm") +
  labs(title = "Regression Plot")
p1 + p2

Git Workflow

Image from Dev Community

Staging and Commiting

git add . #to add a particular file, git add filename
git commit -m "first commit"

Then, all the files are now recorded in part of your commitment history. Check git log.
In git log, hash is the ID specific to a particular commit.

git log

Make any slight modification to the README.txt file, and let’s check if git spots the difference.

git diff README.txt


Tip) Try to leave the staging area clean.

Checking History

git log can include multiple options.

git log -3 # restricts the number of commits to three (most recent)
git log -3 README.txt # displays log of only README.txt
git log --since='Month Day Year' # displays only the commits made after the specified date
git log --since='M D Y' --until='M D Y' # displays only the commits made after and until the specified date


Undoing Changes

  • unstaging a file : git reset HEAD or git reset HEAD file_name

Why call HEAD?

You might wonder why we have to call HEAD when HEAD refers to the last commit. This is because unstaging a file actually means matching the staging area to the last commit. Technically, unstaging is not emptying the staging area, but matching the staging area to the last commit.

  • Undoing changes to an unstaged file : git checkout . or git checkout -- file_name
  • Reverting Commits : git revert HEAD or git revert hash

What is Reverting?

Sometimes, we commit files that contains an error, and then spot the issue. Naturally, we need a command of restoring a repo to the state before the previous commit and make a new commit with the previous version. This is what git revert does. Also, the git revert command opens a text editor in the shell to add a commit message. However, it is also a command that often generates conflict. I do not recommend git revert unless reverting is inevitable. If the command sounds less intuitive, ignore it. Totally fine.

Pushing a Repo

Let’s push the repo and the html file we made to our GitHub account.

  1. Click New in the top-left banner.
  2. Fill the Repository Name* with the local repo name.
  3. Set options to Public and don’t add a README file.
  4. Click Create Repository.


  1. Go to < > Code in the left-top banner.
  2. Also click < > Code button in the right-top.
  3. Copy the HTTPS URL, which would say [https://github.com/username/practice.git].


  1. Back to the terminal, write git remote add origin https://github.com/username/practice.git.
  2. Finally, write git push -u origin main. This means that we’re pushing our local repo to the origin repo’s main branch. Git might require you to insert ID and password. For the password, please insert PAT.
  3. Check the GitHub repo to see your local repo uploaded.

Pushing Cautions

Pushing your local repo to a public repo means you share the items and history of your local repo. This requires a few cautions.

  1. Without inevitable reasons, don’t include data to the git push, especially raw data. This is not only because it takes a lot of space, but most raw data are either publicly available or classified. You can lead the readers to the data website if it’s public in the README.md. If the data is classified, you have a responsibility not to disclose it. To avoid such issues, you can include the raw data files into the .gitignore file and only commit and push selected files of a repo.

  2. Check if your codes include personal information. This could often happen if your code includes personal API keys. If you upload them, you’ll get an email from GitHub that your information is exposed in most cases. However, avoiding is much easier than solving problems.

Publishing via GitHub Pages

  1. Go to the Settings page of the repo in GitHub.
  2. In the left toc, you can find Pages. Click it.
  3. In the Build and Deployment setting, set the source to Deploy from a branch and Branch to main and /(root).
  4. After you build it, go to https://username.github.io/practice/doc/output.html

Cloning a repo

Let’s practice cloning with with package repo of Callaway and Sant’anna(2021).

git clone https://github.com/bcallaway11/did.git

Also add the cloned repo a remote name, so that pull/pushing to the original repo(in GitHub) can be modified. (It is impossible to push to the original repo without the author’s permission, so just take it as an example.)

git remote add origin https://github.com/bcallaway11/did.git # adds the did package repo and alias it as origin
git remote -v

Pulling

Having a cloned version of the did package, you can look into the repo in your local computer. However, the author might make changes to package. You can pull the changes via git pull.

Fetch + Merge

  • git fetch remote_name: fetches a Git remote with the specified name into the current local repo
  • git fetch remote_name branch_name: fetches a Git remote’s branch with the specified remote and branch name
  • git merge remote_name branch_name: merges the local repo to the remote’s branch with the specified remote and branch name

Pulling

  • git pull remote_name branch_name: fetches and merges the remote to the local
cd did
git pull origin master

Why not clone again?

Cloning is copying the entire repo. Pulling is fetching and merging a ‘branch’ out of a remote repo. May be now is the right time to discuss branches.

Brief Intro to Branches

Image Source

git branch #displays which branches exist in the repo(* denotes the current branch)
git branch branch_name #creates a branch
git branch -m old_name new_name #renames a branch
git switch branch_name #moves to the branch
git diff branch1 branch2 #displays diffs between branches
git merge source_b destination_b #merges one branch to the other

Branches enable you to isolate changes for specific features, bug fixes, or experiments without affecting the main codebase(usually the main or master branch). By creating branches, teams can work on multiple tasks concurrently, merging them back into the main branch once they are complete and tested.

One can think of the main branch as the ground truth. Each branch exists for a specific task, and once the task is complete and the process is confirmed to be ground true, it is then merged to the main branch.

General Tips

General

  • Make codes and repo reproducible. Learn how to cleanly code.
    • Remember that you tomorrow is a new reader of your codes.
  • Never forget to commit. Keep a consistent log of your work.
  • In a collaborative environment, notify your collaborators on what you commit to the main branch.

Cleanliness

  • Remember to delete idle branches.
  • Don’t git add . indiscrimminately and try to add files selectively.
  • Don’t add unnecessary large files. Keep them in the .gitignore.

Useful Resources

Websites

Books

  • Pro Git by Ben Straub and Scott Chacon

Example EDA

Return

Personal Access Tokens

To check your Personal Access Token (PAT) in GitHub, follow these steps:

If you cannot find or remember your PAT:

  1. You cannot view its value again.
  2. You will need to generate a new PAT:
  3. Follow the steps in “View Existing Personal Access Tokens” to navigate to the token list.
  4. Revoke the old token if it’s no longer needed or compromised.
  5. Click Generate new token to create a new one.

Personal Access Tokens

Verify PAT is Working.

To check if a PAT is valid:

  1. Use the token in a GitHub API call:
curl -H "Authorization: token YOUR_PERSONAL_ACCESS_TOKEN" https://api.github.com/user

If valid, this returns details about your GitHub user.

  1. Test with Git:
  • Clone a repository:

git clone https://<username>@github.com/<username>/<repo>.git
  • Use the token as the password when prompted.


Tips:

  • Store PAT Securely: Use credential managers, environment variables, or secret management tools to keep it safe.

  • Limit Scopes: Create tokens with the minimal necessary permissions to reduce risks.

  • Use Fine-Grained Tokens: These provide more granular permissions compared to classic tokens.

Return